 data optimization


Collaborative Unlabeled Data Optimization

arXiv.org Artificial Intelligence

This paper pioneers a novel data-centric paradigm to maximize the utility of unlabeled data, tackling a critical question: How can we enhance the efficiency and sustainability of deep learning training by optimizing the data itself? We begin by identifying three key limitations in existing model-centric approaches, all rooted in a shared bottleneck: knowledge extracted from data is locked to model parameters, hindering its reusability and scalability. To this end, we propose CoOpt, a highly efficient, parallelized framework for collaborative unlabeled data optimization, thereby effectively encoding knowledge into the data itself. By distributing unlabeled data and leveraging publicly available task-agnostic models, CoOpt facilitates scalable, reusable, and sustainable training pipelines. Extensive experiments across diverse datasets and architectures demonstrate its efficacy and efficiency, achieving 13.6% and 6.8% improvements on Tiny-ImageNet and ImageNet-1K, respectively, with training speedups of $1.94\times$ and $1.2\times$.


ADO: Automatic Data Optimization for Inputs in LLM Prompts

arXiv.org Artificial Intelligence

This study explores a novel approach to enhance the performance of Large Language Models (LLMs) through the optimization of input data within prompts. While previous research has primarily focused on refining instruction components and augmenting input data with in-context examples, our work investigates the potential benefits of optimizing the input data itself. We introduce a two-pronged strategy for input data optimization: content engineering and structural reformulation. Content engineering involves imputing missing values, removing irrelevant attributes, and enriching profiles by generating additional information inferred from existing attributes. Subsequent to content engineering, structural reformulation is applied to optimize the presentation of the modified content to LLMs, given their sensitivity to input format. Our findings suggest that these optimizations can significantly improve the performance of LLMs in various tasks, offering a promising avenue for future research in prompt engineering. The source code is available at https://anonymous.4open.science/r/ADO-6BC5/
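The two stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how each step is carried out, so simple hand-written rules stand in here. The function names, the `defaults` and `irrelevant` parameters, the sample record, and the `age_group` enrichment rule are all hypothetical.

```python
from typing import Any, Dict, Optional, Set

Record = Dict[str, Optional[Any]]


def content_engineering(record: Record,
                        defaults: Dict[str, Any],
                        irrelevant: Set[str]) -> Dict[str, Any]:
    """Impute missing values, drop irrelevant attributes, enrich the record."""
    cleaned: Dict[str, Any] = {}
    for key, value in record.items():
        if key in irrelevant:
            continue  # remove attributes assumed to be irrelevant to the task
        # Impute a missing value from a per-attribute default (assumed rule).
        cleaned[key] = defaults.get(key, "unknown") if value is None else value
    # Enrichment: derive a new attribute from an existing one (hypothetical rule).
    if isinstance(cleaned.get("age"), int):
        cleaned["age_group"] = "adult" if cleaned["age"] >= 18 else "minor"
    return cleaned


def structural_reformulation(record: Dict[str, Any]) -> str:
    """Render the engineered record as a bulleted key-value block for a prompt."""
    return "\n".join(f"- {key}: {value}" for key, value in record.items())


if __name__ == "__main__":
    raw = {"name": "Ana", "age": None, "session_id": "x9", "city": "Lisbon"}
    engineered = content_engineering(raw, defaults={"age": 30},
                                     irrelevant={"session_id"})
    print(structural_reformulation(engineered))
```

The bulleted key-value layout is only one possible presentation. Because the abstract notes that LLMs are sensitive to input format, a real pipeline would presumably compare several layouts instead of fixing one in advance.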


Data Pipeline Training: Integrating AutoML to Optimize the Data Flow of Machine Learning Models

arXiv.org Artificial Intelligence

Data pipelines play an indispensable role in tasks such as building machine learning models and developing data products. With the increasing diversification and complexity of data sources, as well as the rapid growth of data volumes, building an efficient data pipeline has become crucial for improving work efficiency and solving complex problems. This paper explores how to optimize data flow through automated machine learning methods by integrating AutoML with the data pipeline. We discuss how to leverage AutoML technology to enhance the intelligence of the data pipeline, thereby achieving better results in machine learning tasks. By delving into the automation and optimization of data flows, we uncover key strategies for constructing efficient data pipelines that can adapt to the ever-changing data landscape. This not only accelerates the modeling process but also provides innovative solutions to complex problems, enabling more significant outcomes in increasingly intricate data domains. Keywords: Data Pipeline Training; AutoML; Data environment; Machine learning


Data Optimization in Deep Learning: A Survey

arXiv.org Artificial Intelligence

Large-scale, high-quality data are considered an essential factor for the successful application of many deep learning techniques. Meanwhile, numerous real-world deep learning tasks still have to contend with the lack of sufficient amounts of high-quality data. Additionally, issues such as model robustness, fairness, and trustworthiness are also closely related to training data. Consequently, a huge number of studies in the existing literature have focused on the data aspect in deep learning tasks. Some typical data optimization techniques include data augmentation, logit perturbation, sample weighting, and data condensation. These techniques usually come from different deep learning divisions, and their theoretical inspirations or heuristic motivations may seem unrelated to each other. This study aims to organize a wide range of existing data optimization methodologies for deep learning from the previous literature and to construct a comprehensive taxonomy for them. The constructed taxonomy considers the diversity of split dimensions, and deep sub-taxonomies are constructed for each dimension. On the basis of the taxonomy, connections among the extensive data optimization methods for deep learning are built in terms of four aspects. We also outline several promising and interesting future directions. The constructed taxonomy and the revealed connections should support a better understanding of existing methods and the design of novel data optimization techniques. Furthermore, our aspiration for this survey is to promote data optimization as an independent subdivision of deep learning. A curated, up-to-date list of resources related to data optimization in deep learning is available at https://github.com/YaoRujing/Data-Optimization.


ERP Systems: How It Benefits From Artificial Intelligence - AI Summary

#artificialintelligence

AI can help further improve this aspect of the business in more than one impactful way; for example, it can assist companies with data optimization, i.e. ensuring all their data is not only updated but also optimized and complete. ERP solutions fortified with AI are also able to help companies close any gaps between various departments within the organization, empower executives to make sound, data-driven decisions, and so much more. AI-driven ERP solutions also offer features such as chatbots that can quickly learn from the company's data and then use it to improve customers' journeys and experiences with the brand. Plus, when you find a trusted provider for enterprise software development services, you will also have the requisite expertise to help ensure the success of your effort to fortify your ERP solution with AI. Not only that: researchers have also found that as many as 83 percent of companies believe AI is critical to their efforts to ensure business growth.


How to Bring Your ML Models to Production Faster

#artificialintelligence

After working on several AI projects, I realized how quickly building efficient Machine Learning models is becoming a core competency for companies to compete more effectively. Decision-makers are learning that managing the whole lifecycle of building, deploying, and debugging models within their existing tech stack is not straightforward and brings a new set of challenges. Based on my experience, data scientists often spend time analyzing a dataset, looking for suitable algorithms, and training a new model, then hand it over to data engineers to run in production. This separation can lead to problems where data scientists don't see the challenges of running the model in production, and data engineers don't know how the models are structured. I have often seen data scientists write applications that don't scale in production.


AI Series: Part 1 - AI vs. Machine Learning vs. Deep Learning

#artificialintelligence

How smart are you when it comes to the nuances of Artificial Intelligence (AI)? When you read about the future of AI, it can seem like there are a lot of buzzwords being thrown around in the media. Differentiating among AI, machine learning and deep learning technologies can be confusing, especially when terms are being used interchangeably. Let's begin by clearing things up with a few definitions. First, there is AI, which refers to intelligence exhibited by machines in the form of human cognitive functions like visual perception, speech recognition, decision-making, and language translation.